Chapter 4: Exercises


  1. (*)AM-GM inequality:
    1. State the inequality of arithmetic and geometric means (AM-GM). When does equality hold?
    2. Prove the inequality.
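
    For a concrete instance of what part 1 asks for (a hint, not the full answer), the two-variable case reads: for $a, b \ge 0$, $$ \frac{a+b}{2} \ge \sqrt{ab}, $$ which follows from $\left(\sqrt{a}-\sqrt{b}\right)^2 \ge 0$, with equality if and only if $a=b$. The exercise asks for the general $n$-variable statement and proof.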
  2. (*)MLE for an unfair coin: An unfair coin is tossed 10 times, yielding 7 heads and 3 tails. Use the principle of MLE (maximum likelihood estimation) to show that the estimated probabilities of heads and tails are 0.7 and 0.3, respectively.
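
    A quick numerical sanity check (not a substitute for the derivation) is to evaluate the log-likelihood $7\ln p + 3\ln(1-p)$ on a grid of candidate head probabilities; a minimal sketch in plain MATLAB:

        % Grid search to confirm that 7 heads / 3 tails is most likely when P(head) = 0.7
        p = 0:0.001:1;                       % candidate head probabilities
        logLike = 7*log(p) + 3*log(1-p);     % log-likelihood of 7 heads and 3 tails
        [~, idx] = max(logLike);
        fprintf('MLE of P(head) is about %.3f\n', p(idx));   % prints 0.700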
  3. (*)MLE for a 3-sided die: A 3-sided die is tossed 10 times, yielding side 1 twice, side 2 three times, and side 3 five times. Use the principle of MLE (maximum likelihood estimation) to show that the estimated probabilities of sides 1, 2, and 3 are 0.2, 0.3, and 0.5, respectively.
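
    The same kind of grid check works here over the probability simplex; a minimal sketch:

        % Grid search over (p1, p2) with p3 = 1 - p1 - p2 for the observed counts 2/3/5
        best = -inf;  bestP = [0 0 0];
        for p1 = 0.01:0.01:0.98
            for p2 = 0.01:0.01:0.99-p1
                p3 = 1 - p1 - p2;
                ll = 2*log(p1) + 3*log(p2) + 5*log(p3);   % multinomial log-likelihood
                if ll > best, best = ll; bestP = [p1 p2 p3]; end
            end
        end
        disp(bestP)                          % approximately [0.2 0.3 0.5]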
  4. (*)1D Gaussian PDF and its MLE:
    1. What is the formula for the 1D Gaussian PDF (probability density function)?
    2. Given a set of observations $\{x_1, x_2, ..., x_n\}$, what are the MLEs (maximum likelihood estimates) of $\mu$ and $\sigma^2$ for the 1D Gaussian PDF?
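
    A small numerical reminder once you have derived the answer: the MLE of $\sigma^2$ uses a $1/n$ normalization, while MATLAB's var defaults to $1/(n-1)$; passing 1 as the second argument of var switches it to $1/n$. The data below is made up purely for illustration:

        % The MLE of sigma^2 divides by n, not n-1; var(x, 1) uses the same convention
        x = [2.1 1.8 2.5 1.9 2.7];           % made-up 1-D observations
        muHat = mean(x);                     % MLE of mu
        sigma2Hat = mean((x - muHat).^2);    % MLE of sigma^2 (1/n normalization)
        fprintf('%g  vs. var(x,1)=%g, var(x)=%g\n', sigma2Hat, var(x, 1), var(x));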
  5. (**)Multivariate Gaussian PDF and its MLE:
    1. What is the formula for the multivariate Gaussian PDF (probability density function) in d-dimensional space?
    2. Given a set of observations $\{\mathbf{x}_1, \mathbf{x}_2, ..., \mathbf{x}_n\}$, what are the MLEs (maximum likelihood estimates) of $\mathbf{\mu} \in R^d$ and $\Sigma \in R^{d \times d}$ for the multivariate Gaussian PDF?
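
    The density asked for here is also restated in Exercise 6 below. For concreteness, it can be evaluated directly from that formula; a minimal sketch using only core MATLAB (the file name gaussianEval.m is made up for illustration):

        function p = gaussianEval(x, mu, sigma)
        % gaussianEval: evaluate the d-dimensional Gaussian PDF at a column vector x
        d = length(mu);
        dx = x(:) - mu(:);                   % deviation from the mean
        p = exp(-dx' * (sigma \ dx) / 2) / sqrt((2*pi)^d * det(sigma));

    For instance, gaussianEval(0, 0, 1) returns $1/\sqrt{2\pi} \approx 0.3989$, the standard normal density at the origin.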
  6. (*)Function for computing the MLE of the multivariate Gaussian PDF: The multivariate Gaussian PDF in d-dimensional space is given by $$ g(\mathbf{x}, \mathbf{\mu}, \Sigma)= \frac{1}{\sqrt{(2\pi)^d|\Sigma|}} e^{-(\mathbf{x-\mu})^T\Sigma^{-1}(\mathbf{x-\mu})/2}. $$ Write a function mle4gaussian.m with the following usage: [mu, sigma]=mle4gaussian(X); where $X=\left[\mathbf{x}_1, \mathbf{x}_2, ..., \mathbf{x}_n \right]$ is a matrix with each column being an observation vector, and mu and sigma are the MLEs of the multivariate Gaussian parameters, given by $$ \left\{ \begin{array}{l} \hat{\mathbf{\mu}}=\frac{1}{n}\sum_{i=1}^n\mathbf{x}_i,\\ \hat{\Sigma}=\frac{1}{n}\sum_{i=1}^n \left(\mathbf{x}_i-\hat{\mathbf{\mu}}\right)\left(\mathbf{x}_i-\hat{\mathbf{\mu}}\right)^T.\\ \end{array} \right. $$ Note that you are not allowed to use any of the ready-to-use functions from any MATLAB toolbox. In other words, you need to write the function from scratch.

    Here is a test case. If the input X is given by

        X=[magic(5), magic(5)'];

    then the returned mu is

        13
        13
        13
        13
        13

    and the returned sigma is

        52.0000    4.5000  -28.0000  -23.0000   -5.5000
         4.5000   47.0000   -3.0000  -25.5000  -23.0000
       -28.0000   -3.0000   62.0000   -3.0000  -28.0000
       -23.0000  -25.5000   -3.0000   47.0000    4.5000
        -5.5000  -23.0000  -28.0000    4.5000   52.0000
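
    As a reference point (not necessarily how you should structure your own solution), here is a minimal sketch that simply transcribes the two formulas above using only core MATLAB:

        function [mu, sigma] = mle4gaussian(X)
        % mle4gaussian: MLE of a Gaussian's mean and covariance;
        % each column of X is one observation vector.
        n = size(X, 2);                 % number of observations
        mu = sum(X, 2) / n;             % MLE of the mean (d-by-1)
        Xc = X - repmat(mu, 1, n);      % deviations from the mean
        sigma = Xc * Xc' / n;           % MLE of the covariance (1/n normalization)

    Running [mu, sigma] = mle4gaussian([magic(5), magic(5)']) should reproduce the mu and sigma shown above.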
  7. (*)Simplified formula of the MLE of ND Gaussian PDF: Prove that the MLE of $\Sigma$ of the multivariate Gaussian PDF can be expressed as follows: $$ \hat{\Sigma}=\frac{1}{n}\sum_{i=1}^n \left(\mathbf{x}_i-\hat{\mathbf{\mu}}\right)\left(\mathbf{x}_i-\hat{\mathbf{\mu}}\right)^T = \frac{1}{n}XX^T-\hat{\mathbf{\mu}}\hat{\mathbf{\mu}}^T, $$ where $X=\left[\mathbf{x}_1, \mathbf{x}_2, ..., \mathbf{x}_n \right]$ is a matrix with each column being an observation vector.
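
    Before attempting the proof, the identity can be checked numerically on arbitrary data; a minimal sketch (the random 3-D data below is made up purely for illustration):

        % Compare the sum form and the matrix form of the covariance MLE
        X = rand(3, 50);                  % 50 made-up 3-D observation vectors
        n = size(X, 2);
        mu = mean(X, 2);
        S1 = zeros(3);                    % sum form
        for i = 1:n
            S1 = S1 + (X(:,i) - mu) * (X(:,i) - mu)';
        end
        S1 = S1 / n;
        S2 = X * X' / n - mu * mu';       % matrix form
        max(abs(S1(:) - S2(:)))           % difference is at machine-precision level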
  8. (*)Incremental formula for the MLE of ND Gaussian PDF: Given the observation matrix $X=\left[\mathbf{x}_1, \mathbf{x}_2, ..., \mathbf{x}_n \right]$ where each column is an observation vector, the MLE of a d-dimensional Gaussian PDF can be expressed as: $$ \left\{ \begin{array}{l} \hat{\mathbf{\mu}}=\frac{1}{n}\sum_{i=1}^n\mathbf{x}_i,\\ \hat{\Sigma}=\frac{1}{n}XX^T-\hat{\mathbf{\mu}}\hat{\mathbf{\mu}}^T.\\ \end{array} \right. $$ We use $\hat{\mathbf{\mu}}_{-k}$ and $\hat{\Sigma}_{-k}$ to denote the MLE after removing $\mathbf{x}_k$ from $X$. Prove that this leave-one-out MLE can be expressed as: $$ \left\{ \begin{array}{l} \hat{\mathbf{\mu}}_{-k}=\frac{n\hat{\mathbf{\mu}}-\mathbf{x}_k}{n-1},\\ \hat{\Sigma}_{-k}=\frac{n}{n-1}\hat{\Sigma}+\frac{n}{n-1}\hat{\mathbf{\mu}}\hat{\mathbf{\mu}}^T-\hat{\mathbf{\mu}}_{-k}\hat{\mathbf{\mu}}_{-k}^T -\frac{1}{n-1}\mathbf{x}_k\mathbf{x}_k^T.\\ \end{array} \right. $$
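
    The same kind of numerical check applies here: remove a column, recompute the MLE directly, and compare with the incremental formulas. A minimal sketch with made-up data and an arbitrarily chosen k:

        % Verify the leave-one-out (incremental) MLE formulas on random data
        X = rand(4, 20);  n = size(X, 2);  k = 7;     % remove the k-th observation
        mu = mean(X, 2);  sigma = X*X'/n - mu*mu';    % full-data MLE
        xk = X(:, k);
        muK = (n*mu - xk) / (n-1);                    % incremental mean
        sigmaK = n/(n-1)*sigma + n/(n-1)*(mu*mu') - muK*muK' - xk*xk'/(n-1);
        Y = X(:, [1:k-1, k+1:n]);                     % direct recomputation without x_k
        muRef = mean(Y, 2);  sigmaRef = Y*Y'/(n-1) - muRef*muRef';
        [max(abs(muK - muRef)), max(max(abs(sigmaK - sigmaRef)))]   % both near zero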
  9. (*)ND Gaussian PDF with a non-full covariance matrix: Given a set of observations $\{\mathbf{x}_1, \mathbf{x}_2, ..., \mathbf{x}_n\}$, we want to find the MLEs of $\mathbf{\mu}$ and $\Sigma$ for the multivariate Gaussian PDF when $\Sigma$ is restricted to specific non-full forms.
    1. If $\Sigma$ is restricted to be diagonal, that is, $$ \Sigma=diag(\alpha_1, \cdots, \alpha_d)= \left[ \begin{array}{ccc} \alpha_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \alpha_d \\ \end{array} \right] $$ prove that the MLEs of $\mathbf{\mu}$ and $\Sigma$ are $$ \left\{ \begin{array}{l} \hat{\mathbf{\mu}}=\frac{1}{n}\sum_{i=1}^n\mathbf{x}_i,\\ \hat{\alpha}_k=\frac{1}{n}\sum_{i=1}^n \left(\mathbf{x}_i(k)-\hat{\mathbf{\mu}}(k)\right)^2, k=1, \ldots, d.\\ \end{array} \right. $$
    2. If $\Sigma$ is restricted to be $\alpha I$ (a constant times an identity matrix), prove that the MLEs of $\mathbf{\mu}$ and $\Sigma$ are $$ \left\{ \begin{array}{l} \hat{\mathbf{\mu}}=\frac{1}{n}\sum_{i=1}^n\mathbf{x}_i,\\ \hat{\alpha}=\frac{1}{nd}\sum_{i=1}^n \left(\mathbf{x}_i-\hat{\mathbf{\mu}}\right)^T\left(\mathbf{x}_i-\hat{\mathbf{\mu}}\right).\\ \end{array} \right. $$
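
    A handy sanity check relating these restricted MLEs to the unrestricted one: the diagonal estimates $\hat{\alpha}_k$ are exactly the diagonal entries of the full $\hat{\Sigma}$, and the spherical $\hat{\alpha}$ is their average. A minimal numeric sketch with made-up data:

        % Relate the diagonal and spherical MLEs to the full covariance MLE
        X = rand(3, 100);  [d, n] = size(X);          % made-up 3-D observations
        mu = mean(X, 2);
        Xc = X - repmat(mu, 1, n);
        sigmaFull = Xc * Xc' / n;                     % unrestricted covariance MLE
        alphaDiag = mean(Xc.^2, 2);                   % restricted diagonal MLE (per dimension)
        alphaSph  = sum(sum(Xc.^2)) / (n*d);          % restricted spherical MLE
        [max(abs(alphaDiag - diag(sigmaFull))), abs(alphaSph - trace(sigmaFull)/d)]  % both ~0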
